
 standard normal distribution


NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache

Neural Information Processing Systems

Large Language Model (LLM) inference is typically memory-intensive, especially when processing large batch sizes and long sequences, because of the large size of the key-value (KV) cache. Vector Quantization (VQ) has recently been adopted to alleviate this issue, but we find that the existing approach is susceptible to distribution shift because it relies on calibration datasets. To address this limitation, we introduce NSNQuant, a calibration-free VQ technique designed for low-bit compression of the KV cache. NSNQuant applies a three-step transformation together with a Hadamard transform: 1) a token-wise normalization (Normalize), 2) a channel-wise centering (Shift), and 3) a second token-wise normalization (Normalize). This effectively aligns the token distribution with the standard normal distribution. The alignment enables robust, calibration-free vector quantization using a single reusable codebook. Extensive experiments show that NSNQuant consistently outperforms prior methods in both 1-bit and 2-bit settings, offering strong generalization and up to a 3× throughput gain over full-precision baselines.
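A minimal NumPy sketch of the Normalize-Shift-Normalize pipeline, followed by a Hadamard rotation and nearest-codeword lookup. Several details are assumptions that the abstract does not state:

- Each token is rescaled to L2 norm √d, so its entries have unit mean square.
- The channel mean is taken over the tokens of one block.
- The per-token norms and channel means are kept in full precision for dequantization.
- Groups of g = 4 consecutive channels are quantized together.
- The codebook is a fixed set of 256 standard-normal samples. It stands in for NSNQuant's actual codebook, which is not described here.

All helper names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(d: int) -> np.ndarray:
    """Orthonormal Hadamard matrix of size d (d must be a power of two)."""
    if d < 1 or d & (d - 1):
        raise ValueError("d must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < d:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(d)  # symmetric and orthogonal, so h @ h == I


def _scale_rows(x: np.ndarray, target: float):
    """Rescale every row (token) to L2 norm `target`; also return the original norms."""
    norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return x / norms * target, norms


def nsn_transform(k: np.ndarray):
    """Normalize -> Shift -> Normalize, then Hadamard-rotate a (tokens, d) block."""
    d = k.shape[1]
    x, n1 = _scale_rows(k, np.sqrt(d))       # 1) token-wise normalization
    mu = x.mean(axis=0, keepdims=True)       # 2) channel-wise mean over tokens (assumed)
    x, n2 = _scale_rows(x - mu, np.sqrt(d))  # 3) second token-wise normalization
    x = x @ sylvester_hadamard(d)            # rotation spreads outliers across channels
    return x, (n1, mu, n2)


def nsn_inverse(x: np.ndarray, stats) -> np.ndarray:
    """Undo nsn_transform using the stored norms and channel means."""
    n1, mu, n2 = stats
    d = x.shape[1]
    x = x @ sylvester_hadamard(d)            # the normalized Hadamard matrix is its own inverse
    x = x * n2 / np.sqrt(d) + mu
    return x * n1 / np.sqrt(d)


def vq_encode(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each group of g consecutive channels to its nearest codeword index."""
    g = codebook.shape[1]
    v = x.reshape(-1, g)
    dist = ((v[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1).reshape(x.shape[0], -1)


def vq_decode(codes: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look up codewords and flatten the groups back into (tokens, d)."""
    return codebook[codes].reshape(codes.shape[0], -1)


rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 4))            # hypothetical: 8-bit index per 4 channels = 2 bits/channel
keys = rng.standard_normal((64, 128)) * 3.0 + 5.0   # toy KV block with a shared channel offset
x, stats = nsn_transform(keys)
codes = vq_encode(x, codebook)
keys_hat = nsn_inverse(vq_decode(codes, codebook), stats)
```

The transformed tokens are meant to look standard normal whatever the input statistics are. That is why one fixed codebook can be reused for every KV block, which is the property that removes the need for calibration data.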



Hierarchical Probabilistic Principal Component Analysis of Longitudinal Data

arXiv.org Machine Learning

In many longitudinal studies, a large number of variables are measured repeatedly over time, with substantial missing data. Existing methods, such as probabilistic principal component analysis (PPCA), are ill-equipped to handle such incomplete, high-dimensional longitudinal data because they fail to account for the nested sources of variation and the temporal dependency inherent in repeated measures. We introduce hierarchical probabilistic principal component analysis (HPPCA), a two-level probabilistic factor model that explicitly separates between-subject variance from time-varying within-subject dynamics. The within-subject latent factors are modeled by a Gaussian process. We develop an EM algorithm that handles missing data and flexible covariance kernels, accelerated by computationally efficient initializers. Simulation studies demonstrate that HPPCA robustly recovers model parameters and subspaces. It also substantially outperforms both standard PPCA and multivariate functional PCA in imputation accuracy, even under heavy missingness and model misspecification. An application to long COVID symptoms in the Researching COVID to Enhance Recovery adult cohort revealed that HPPCA effectively captured the data's hierarchical structure. Its learned features also significantly improved the prediction of clinical outcomes and the recovery of masked clinical records compared with existing methods.
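To make the two-level structure concrete, here is a hedged simulation sketch of one plausible form of such a model. A subject-level factor z_i with loadings B stays constant over time. Within-subject factors u_i(t) with loadings W follow paths drawn from a Gaussian process with a squared-exponential kernel. The exact parameterization, the kernel, the noise model, and the missing-completely-at-random mask are assumptions for illustration, not the paper's specification. The function names are hypothetical, and the EM fitting procedure is not shown.

```python
import numpy as np


def rbf_kernel(t: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential covariance between all pairs of time points."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def simulate_hppca(n_subjects, times, p, q_between, q_within,
                   noise_sd=0.1, missing_rate=0.3, seed=0):
    """Draw toy data from a two-level factor model (hypothetical parameterization).

    y_i(t) = mu + B z_i + W u_i(t) + eps,  z_i ~ N(0, I),  each u_i component ~ GP(0, k)
    Returns shape (n_subjects, len(times), p), with NaN marking missing entries.
    """
    rng = np.random.default_rng(seed)
    T = len(times)
    mu = rng.standard_normal(p)
    B = rng.standard_normal((p, q_between))   # between-subject loadings
    W = rng.standard_normal((p, q_within))    # within-subject loadings
    # A small jitter keeps the kernel matrix numerically positive definite for Cholesky.
    L = np.linalg.cholesky(rbf_kernel(times) + 1e-6 * np.eye(T))

    z = rng.standard_normal((n_subjects, q_between))
    # Each within-subject factor is an independent GP path: L @ white noise has covariance K.
    u = np.einsum("ts,nsq->ntq", L, rng.standard_normal((n_subjects, T, q_within)))

    y = (mu
         + (z @ B.T)[:, None, :]               # constant over time for each subject
         + u @ W.T                             # time-varying within-subject dynamics
         + noise_sd * rng.standard_normal((n_subjects, T, p)))
    y[rng.random(y.shape) < missing_rate] = np.nan  # missing completely at random
    return y


times = np.linspace(0.0, 5.0, 8)
y = simulate_hppca(n_subjects=50, times=times, p=10, q_between=2, q_within=3)
```

Data of this shape (subjects × visits × variables, with gaps) is the kind of input the abstract describes. A fitting routine would have to recover B, W, and the kernel from the observed entries only.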







A Proofs: A.1 Proof of Proposition 1. We first show that for any T

Neural Information Processing Systems

A.2 Proof of Relation (3). We can write D … One class of transport maps we consider in our numerical experiments (i.e., to approximate … Another class of transport maps used in our numerical experiments is inverse autoregressive flows (IAFs). IAFs are built as a composition of component-wise affine transformations, where the shift and scaling functions of each component depend only on earlier-indexed variables. A flow typically consists of several IAF stages, with the components either randomly permuted or, as we choose, reversed between consecutive stages. Here we discuss how generalized linear models may naturally admit lazy structure. Here we describe the numerical algorithms required by the lazy map framework.
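A minimal NumPy sketch of the IAF construction described above. In each stage, component i becomes x_i = μ_i(z_<i) + σ_i(z_<i) · z_i. Here the shift and log-scale are linear functions of the earlier components through strictly lower-triangular weights. Practical IAFs use masked neural networks instead, so the linear form and the function names are hypothetical. Because each stage has a triangular Jacobian with diagonal σ_i, its log-determinant is Σ_i log σ_i. Reversing the component order between stages is a permutation, which leaves |det| unchanged.

```python
import numpy as np


def iaf_stage(z, w_mu, w_s, b_mu, b_s):
    """One affine autoregressive stage: x_i = mu_i(z_<i) + sigma_i(z_<i) * z_i.

    Shift and log-scale are linear in z through strictly lower-triangular
    weights, so component i only sees components 0..i-1.
    """
    mask = np.tril(np.ones_like(w_mu), k=-1)   # strictly lower-triangular mask
    mu = z @ (w_mu * mask).T + b_mu
    log_sigma = z @ (w_s * mask).T + b_s
    x = mu + np.exp(log_sigma) * z
    log_det = log_sigma.sum(axis=-1)           # triangular Jacobian: sum of log diagonal
    return x, log_det


def iaf_flow(z, params):
    """Compose stages, reversing the component order between consecutive stages."""
    log_det = np.zeros(z.shape[0])
    for k, p in enumerate(params):
        if k > 0:
            z = z[:, ::-1]                     # reversal is a permutation: |det| = 1
        z, ld = iaf_stage(z, *p)
        log_det += ld
    return z, log_det


rng = np.random.default_rng(0)
d = 4
params = [tuple(0.3 * rng.standard_normal(s) for s in [(d, d), (d, d), (d,), (d,)])
          for _ in range(3)]
z = rng.standard_normal((5, d))
x, log_det = iaf_flow(z, params)
```

Sampling with an IAF is a single parallel pass over z, as above. Evaluating the density of a given x instead requires inverting the stages one component at a time.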


A Q-value convergence. We here show that if a tabular agent converges to a policy π in a continuous NDP, then Q

Neural Information Processing Systems

See Singh et al. (2000). Moreover, SARSA and Expected SARSA are both appropriate if the agent is greedy in the limit. Note that condition 2 requires the agent to take every action in every state infinitely many times. Proof. Let A satisfy the following in a given NDP: A is greedy in the limit, i.e., for all δ > 0, P(Q … A's Q-values are accurate in the limit, i.e., if π … Then φ has a fixed point. Theorem 3. Every continuous NDP has a strongly ratifiable policy.
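For reference, here is a tabular sketch of the Expected SARSA update under an ε-greedy policy whose exploration rate decays as ε = 1/n(s), where n(s) counts visits to state s. This schedule is a common choice and an assumption here, not taken from the text. With it, ε still sums to infinity over the visits to any state visited infinitely often, so every action there keeps being tried infinitely often. Yet ε → 0, so the policy is greedy in the limit. The sketch uses an ordinary MDP-style update and does not model the policy-dependent dynamics of an NDP. All function names are hypothetical.

```python
import numpy as np


def glie_epsilon(visits: int) -> float:
    """epsilon = 1/n(s): diverging sum keeps exploring, limit 0 makes the policy greedy."""
    return 1.0 / max(visits, 1)


def epsilon_greedy_probs(q_row: np.ndarray, epsilon: float) -> np.ndarray:
    """Action probabilities of an epsilon-greedy policy in one state."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs


def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
    """Tabular Expected SARSA: bootstrap on the policy's expected next-state value."""
    probs = epsilon_greedy_probs(Q[s_next], epsilon)
    target = r + gamma * (probs @ Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]
Q = expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9, epsilon=0.5)
# Q[0, 0] is now 0.5 * (1.0 + 0.9 * (0.25 * 1.0 + 0.75 * 3.0)) = 1.625
```

Plain SARSA would replace the expectation `probs @ Q[s_next]` with `Q[s_next, a_next]` for the action actually sampled next.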